Data Quality in Data Warehouses

Author

  • William E. Winkler

Abstract

Fayyad and Uthurusamy (2002) have stated that the majority of the work (representing months or years) in creating a data warehouse lies in cleaning up duplicates and resolving other anomalies. This article provides an overview of two methods for improving data quality. The first is data cleaning, which finds duplicates within files or across files. The second is edit/imputation, which enforces business rules and fills in missing data. The fastest data-cleaning methods are suitable for files with hundreds of millions of records (Winkler, 1999b, 2003b). The fastest edit/imputation methods are suitable for files with millions of records (Winkler, 1999a, 2004b).
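
The two kinds of method named in the abstract can be illustrated with a minimal sketch. This is not the article's approach: simple string similarity and median imputation are used here as stand-ins, and the record fields, the threshold, the age limits, and the function names `find_duplicates` and `edit_and_impute` are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import median

# Hypothetical toy records; the field names are illustrative assumptions.
records = [
    {"id": 1, "name": "John A. Smith", "zip": "20233", "age": 34},
    {"id": 2, "name": "Jon A Smith", "zip": "20233", "age": None},
    {"id": 3, "name": "Mary Jones", "zip": "20233", "age": 41},
    {"id": 4, "name": "Ann Lee", "zip": "10001", "age": 150},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(rows, threshold=0.85):
    """Data cleaning: flag likely duplicate pairs.

    Records are first grouped ("blocked") by zip code so that only
    records within the same block are compared, which avoids comparing
    every record with every other record.
    """
    blocks = {}
    for row in rows:
        blocks.setdefault(row["zip"], []).append(row)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a["name"], b["name"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


def edit_and_impute(rows, min_age=0, max_age=120):
    """Edit/imputation: enforce a business rule and fill missing values.

    The edit rule is min_age <= age <= max_age. Values that are missing
    or fail the rule are replaced by the median of the valid ages.
    """
    def passes(age):
        return age is not None and min_age <= age <= max_age

    fill = median(row["age"] for row in rows if passes(row["age"]))
    return [
        row if passes(row["age"]) else {**row, "age": fill}
        for row in rows
    ]


duplicate_pairs = find_duplicates(records)
cleaned = edit_and_impute(records)
```

In this sketch, blocking trades recall for speed: records whose blocking key differs are never compared, so a duplicate with a mistyped zip code would be missed.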

Related resources

A Framework for Data Cleaning in Data Warehouses

Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. To deal with this challenge, a set of methods and tools has been developed. However, there are still at least two questions that need to be answered: How to improve the efficiency while performing data cleaning? How to improve the degree of automation when perfor...

Data Quality Management in Web Warehouses using BPM

The increasing amount of data published on the Web poses the new challenge of enabling different kinds of users and organizations to exploit these data. Additionally, the quality of published data is highly heterogeneous and, worse, it is unknown to the data consumer. In this context, we consider Web Warehouses (WW) (Data Warehouses populated by web data so...

Design and Analysis of Quality Information for Data Warehouses

Data warehouses are complex systems that have to deliver highly aggregated, high-quality data from heterogeneous sources to decision makers. Due to the dynamic change in the requirements and the environment, data warehouse systems rely on meta databases to control their operation and to aid their evolution. In this paper, we present an approach to assess the quality of the data warehouse via a s...

Using Time Series to Assess Data Quality in Telecommunications Data Warehouses

The growing complexity of telephone services, particularly in mobile telephony, and its impact upon billing data mean that phone call volume modelling techniques have become crucial in assessing the accuracy of the information available in telecommunications data warehouses. Time series modelling, normally used for forecasting, provides a suitable tool for this purpose as well. Preliminary ex...

Methodological Guidelines and Adaptive Statistical Data Validation to Build Effective Data Warehouses

Over time, data integration involving data warehouses is becoming more difficult to develop and to manage due to the growing heterogeneity of data sources. Despite the significant advances in research and technologies, many integration projects are still too slow to generate pragmatic results and are often abandoned before that. The objective of this work is the specification of a developing st...

Repository Support for Data Warehouse Evolution

Data warehouses are complex systems consisting of many components which store highly aggregated data for decision support. Due to the role of data warehouses in the daily business work of an enterprise, the requirements for the design and the implementation are dynamic and subjective. Therefore, data warehouse design is a continuous process which has to reflect the changing environment of a ...

Publication date: 2009